1. Field of the Invention
The present invention relates in general to computers, and more particularly to methods, computer systems, and computer program products for deduplicating data.
2. Description of the Related Art
Over time, data deduplication engines are presented with multiple similar copies of the same data. Unfortunately, these copies are usually presented through a back-up tool or application. Almost always, the back-up application adds its own metadata (block and file headers, for example) as an overlay on the underlying user data being backed up. At best, this overlay causes minor interference with the algorithms of the deduplication engine; at worst, it is so detrimental that deduplication efficiency becomes marginal.
One commonly employed method is to preprocess the back-up data to remove the overlay, or enough of the overlay to minimize the interference. However, some common back-up tools also reorder the user data as it is sent to the back-up media. This reordering breaks the matching of the user data that underlies the application overlay, resulting in poor deduplication.
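By way of illustration only, the overlay interference and the preprocessing remedy can be sketched in Python. The back-up format is hypothetical: an 8-byte header carrying a per-session number precedes every 100 bytes of user data, and the deduplication engine is assumed to fingerprint fixed 64-byte chunks with SHA-256. Because the headers differ between two back-ups of identical user data, many chunks no longer match; stripping the headers first restores full matching. The sketch does not model the reordering problem.

```python
import hashlib

CHUNK = 64       # hypothetical fixed chunk size used by the dedup engine
BLOCK = 100      # hypothetical number of user bytes per back-up block
HEADER_LEN = 8   # hypothetical length of each back-up block header


def chunk_hashes(data: bytes) -> set:
    """Fingerprint fixed-size chunks, as a simple dedup engine might."""
    return {hashlib.sha256(data[i:i + CHUNK]).hexdigest()
            for i in range(0, len(data), CHUNK)}


def wrap(user: bytes, session: int) -> bytes:
    """Hypothetical back-up overlay: a per-session header before every BLOCK bytes."""
    header = f"HDR{session:05d}".encode()  # e.g. b"HDR00001", 8 bytes
    assert len(header) == HEADER_LEN
    return b"".join(header + user[i:i + BLOCK] for i in range(0, len(user), BLOCK))


def strip(stream: bytes) -> bytes:
    """Preprocessing step: drop each header to recover the underlying user data."""
    step = HEADER_LEN + BLOCK
    return b"".join(stream[i + HEADER_LEN:i + step] for i in range(0, len(stream), step))


def shared_fraction(a: bytes, b: bytes) -> float:
    """Fraction of b's chunks that are already stored from a."""
    ha, hb = chunk_hashes(a), chunk_hashes(b)
    return len(ha & hb) / len(hb)


# Identical user data backed up in two sessions (deterministic, non-repeating bytes).
user = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(100))
first, second = wrap(user, 1), wrap(user, 2)

raw = shared_fraction(first, second)
cleaned = shared_fraction(strip(first), strip(second))
print(f"shared chunks with overlay: {raw:.2f}, after stripping headers: {cleaned:.2f}")
```

Fixed-size chunking is chosen here only because it makes the misalignment easy to see; engines that use content-defined chunking are affected differently, but any chunk that contains header bytes still fails to match.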